Overview

Brought to you by YData

Dataset statistics

Number of variables: 7
Number of observations: 1338
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1
Duplicate rows (%): 0.1%
Total size in memory: 286.5 KiB
Average record size in memory: 219.3 B
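
The derived figures above follow from the raw counts. The short Python sketch below recomputes them. It assumes that 1 KiB is 1024 bytes and that the report rounds percentages and sizes to one decimal place.

```python
# Raw figures from the dataset statistics
n_rows = 1338
n_vars = 7
n_duplicate_rows = 1
total_size_kib = 286.5  # assumption: 1 KiB = 1024 bytes

n_cells = n_rows * n_vars  # total number of cells in the table
duplicate_pct = 100 * n_duplicate_rows / n_rows
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Cells: {n_cells}")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

It prints 9366 cells, 0.1% and 219.3 B, in line with the reported values.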

Variable types

Numeric: 4
Categorical: 2
Boolean: 1

Alerts

Dataset has 1 (0.1%) duplicate row [Duplicates]
age is highly overall correlated with charges [High correlation]
charges is highly overall correlated with age and 1 other field (smoker) [High correlation]
smoker is highly overall correlated with charges [High correlation]
children has 574 (42.9%) zeros [Zeros]

Reproduction

Analysis started: 2025-09-05 05:52:19.582048
Analysis finished: 2025-09-05 05:52:21.270892
Duration: 1.69 seconds
Software version: ydata-profiling v4.16.1

Variables

age
Real number (ℝ)

High correlation

Distinct: 47
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.207025
Minimum: 18
Maximum: 64
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 18
5th percentile: 18
Q1: 27
median: 39
Q3: 51
95th percentile: 62
Maximum: 64
Range: 46
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.04996
Coefficient of variation (CV): 0.35835313
Kurtosis: -1.2450877
Mean: 39.207025
Median Absolute Deviation (MAD): 12
Skewness: 0.055672516
Sum: 52459
Variance: 197.40139
Monotonicity: Not monotonic
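
The descriptive statistics are tied together by a few identities: mean = sum / n, variance = (standard deviation)², and CV = standard deviation / mean. A minimal Python check using the reported age values and n = 1338 from the dataset statistics; the tolerances are loose because the report prints rounded numbers.

```python
import math

n = 1338          # number of observations
total = 52459     # reported Sum
std = 14.04996    # reported standard deviation

mean = total / n
variance = std ** 2
cv = std / mean

print(f"Mean: {mean:.6f}")
print(f"Variance: {variance:.5f}")
print(f"CV: {cv:.8f}")

# Compare against the reported values, allowing for rounding in the report
assert math.isclose(mean, 39.207025, rel_tol=1e-6)
assert math.isclose(variance, 197.40139, rel_tol=1e-6)
assert math.isclose(cv, 0.35835313, rel_tol=1e-6)
```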
[Figure: histogram with fixed size bins (bins=47)]
Common values

Value              Count  Frequency (%)
18                 69     5.2%
19                 68     5.1%
50                 29     2.2%
51                 29     2.2%
47                 29     2.2%
46                 29     2.2%
45                 29     2.2%
20                 29     2.2%
48                 29     2.2%
52                 29     2.2%
Other values (37)  969    72.4%

Minimum 10 values

Value  Count  Frequency (%)
18     69     5.2%
19     68     5.1%
20     29     2.2%
21     28     2.1%
22     28     2.1%
23     28     2.1%
24     28     2.1%
25     28     2.1%
26     28     2.1%
27     28     2.1%

Maximum 10 values

Value  Count  Frequency (%)
64     22     1.6%
63     23     1.7%
62     23     1.7%
61     23     1.7%
60     23     1.7%
59     25     1.9%
58     25     1.9%
57     26     1.9%
56     26     1.9%
55     26     1.9%

sex
Categorical

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 81.1 KiB
male: 676
female: 662

Length

Max length: 6
Median length: 4
Mean length: 4.9895366
Min length: 4

Characters and Unicode

Total characters: 6676
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: female
2nd row: male
3rd row: male
4th row: male
5th row: male

Common Values

Value   Count  Frequency (%)
male    676    50.5%
female  662    49.5%

[Figure: histogram of lengths of the category]

Most occurring characters

Value  Count  Frequency (%)
e      2000   30.0%
m      1338   20.0%
a      1338   20.0%
l      1338   20.0%
f      662    9.9%
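
The character statistics for sex follow directly from the two category counts: each occurrence of a value contributes each of its letters, so e appears once in "male" and twice in "female", giving 676 + 2 × 662 = 2000. A minimal Python sketch that rebuilds the character table, the total character count, and the mean length from the value counts (only the counts come from the report; the code is illustrative):

```python
from collections import Counter

# Category counts for `sex`, taken from the Common Values table
value_counts = {"male": 676, "female": 662}

# Every occurrence of a value contributes each of its characters once
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

n_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows

for ch, count in char_counts.most_common():
    print(f"{ch}  {count}  {100 * count / total_chars:.1f}%")
print("Total characters:", total_chars)
print(f"Mean length: {mean_length:.7f}")
```

The same weighting reproduces the region character table, where every value has nine letters and t appears twice in each.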

Unicode categories, scripts, and blocks

Each of these groupings resolves to a single (unknown) group holding all 6676 characters (100.0%), so the most frequent characters per category, per script, and per block are the same as in the Most occurring characters table.

bmi
Real number (ℝ)

Distinct: 548
Distinct (%): 41.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.663397
Minimum: 15.96
Maximum: 53.13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 15.96
5th percentile: 21.256
Q1: 26.29625
median: 30.4
Q3: 34.69375
95th percentile: 41.106
Maximum: 53.13
Range: 37.17
Interquartile range (IQR): 8.3975

Descriptive statistics

Standard deviation: 6.0981869
Coefficient of variation (CV): 0.19887513
Kurtosis: -0.050731531
Mean: 30.663397
Median Absolute Deviation (MAD): 4.18
Skewness: 0.28404711
Sum: 41027.625
Variance: 37.187884
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
Common values

Value               Count  Frequency (%)
32.3                13     1.0%
28.31               9      0.7%
30.495              8      0.6%
30.875              8      0.6%
31.35               8      0.6%
30.8                8      0.6%
34.1                8      0.6%
28.88               8      0.6%
33.33               7      0.5%
35.2                7      0.5%
Other values (538)  1254   93.7%

Minimum 10 values

Value   Count  Frequency (%)
15.96   1      0.1%
16.815  2      0.1%
17.195  1      0.1%
17.29   3      0.2%
17.385  1      0.1%
17.4    1      0.1%
17.48   1      0.1%
17.67   1      0.1%
17.765  1      0.1%
17.8    1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
53.13  1      0.1%
52.58  1      0.1%
50.38  1      0.1%
49.06  1      0.1%
48.07  1      0.1%
47.74  1      0.1%
47.6   1      0.1%
47.52  1      0.1%
47.41  1      0.1%
46.75  1      0.1%
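
The quantile statistics and the list of the ten largest values can be combined to see how far the upper tail reaches. The Python sketch below recomputes the range and IQR from the reported quantiles and then applies Tukey's fences (Q1 - 1.5 × IQR, Q3 + 1.5 × IQR). The fences are a common outlier rule of thumb added here for illustration; the report itself does not compute them.

```python
# Reported quantiles and extremes for `bmi`
q1, q3 = 26.29625, 34.69375
minimum, maximum = 15.96, 53.13

iqr = q3 - q1
value_range = maximum - minimum

# Tukey's fences: an illustrative rule of thumb, not a report statistic
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The ten largest bmi values listed above (each occurs once)
largest = [53.13, 52.58, 50.38, 49.06, 48.07, 47.74, 47.6, 47.52, 47.41, 46.75]
above_upper = [v for v in largest if v > upper_fence]

print(f"IQR: {iqr:.4f}, range: {value_range:.2f}")
print(f"Fences: [{lower_fence:.2f}, {upper_fence:.2f}]")
print(f"{len(above_upper)} of the 10 largest values lie above the upper fence")
```

Under this rule 9 rows would be flagged as high outliers, because the tenth-largest value (46.75) already falls below the upper fence of 47.29. No rows would be flagged as low outliers, since the minimum (15.96) sits above the lower fence of 13.7.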

children
Real number (ℝ)

Zeros

Distinct: 6
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0949178
Minimum: 0
Maximum: 5
Zeros: 574
Zeros (%): 42.9%
Negative: 0
Negative (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 1
Q3: 2
95th percentile: 3
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.2054927
Coefficient of variation (CV): 1.1009893
Kurtosis: 0.20245415
Mean: 1.0949178
Median Absolute Deviation (MAD): 1
Skewness: 0.93838044
Sum: 1465
Variance: 1.4532127
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=6)]
Common values

Value  Count  Frequency (%)
0      574    42.9%
1      324    24.2%
2      240    17.9%
3      157    11.7%
4      25     1.9%
5      18     1.3%

Minimum 10 values

Value  Count  Frequency (%)
0      574    42.9%
1      324    24.2%
2      240    17.9%
3      157    11.7%
4      25     1.9%
5      18     1.3%

Maximum 10 values

Value  Count  Frequency (%)
5      18     1.3%
4      25     1.9%
3      157    11.7%
2      240    17.9%
1      324    24.2%
0      574    42.9%
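
Because children takes only six distinct values, the whole column can be rebuilt from its value counts and most of the statistics above recomputed exactly. The Python sketch below does this with the standard `statistics` module. The reported variance matches the sample variance (n - 1 denominator), and the quartiles match the `inclusive` method (linear interpolation between order statistics); both are inferences from the match, not settings stated in the report.

```python
import statistics

# Value counts for `children`, taken from the table above
counts = {0: 574, 1: 324, 2: 240, 3: 157, 4: 25, 5: 18}

# Expand the counts back into one value per row
values = [value for value, count in sorted(counts.items()) for _ in range(count)]

n = len(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)
cv = std / mean
zeros_pct = 100 * counts[0] / n

# method="inclusive" interpolates linearly between order statistics
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(f"n={n}, sum={sum(values)}, mean={mean:.7f}")
print(f"variance={variance:.7f}, std={std:.7f}, CV={cv:.7f}")
print(f"zeros={zeros_pct:.1f}%, Q1={q1}, median={median}, Q3={q3}")
```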

smoker
Boolean

High correlation

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB
False: 1064
True: 274
Value  Count  Frequency (%)
False  1064   79.5%
True   274    20.5%

region
Categorical

Distinct: 4
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 86.4 KiB
southeast: 364
southwest: 325
northwest: 325
northeast: 324

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 12042
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: southwest
2nd row: southeast
3rd row: southeast
4th row: northwest
5th row: northwest

Common Values

Value      Count  Frequency (%)
southeast  364    27.2%
southwest  325    24.3%
northwest  325    24.3%
northeast  324    24.2%

[Figure: histogram of lengths of the category]

Most occurring characters

Value  Count  Frequency (%)
t      2676   22.2%
s      2027   16.8%
o      1338   11.1%
h      1338   11.1%
e      1338   11.1%
u      689    5.7%
a      688    5.7%
w      650    5.4%
n      649    5.4%
r      649    5.4%

Unicode categories, scripts, and blocks

Each of these groupings resolves to a single (unknown) group holding all 12042 characters (100.0%), so the most frequent characters per category, per script, and per block are the same as in the Most occurring characters table.

charges
Real number (ℝ)

High correlation

Distinct: 1337
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13270.422
Minimum: 1121.8739
Maximum: 63770.428
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 1121.8739
5th percentile: 1757.7534
Q1: 4740.2872
median: 9382.033
Q3: 16639.913
95th percentile: 41181.828
Maximum: 63770.428
Range: 62648.554
Interquartile range (IQR): 11899.625

Descriptive statistics

Standard deviation: 12110.011
Coefficient of variation (CV): 0.91255659
Kurtosis: 1.6062987
Mean: 13270.422
Median Absolute Deviation (MAD): 5018.7571
Skewness: 1.5158797
Sum: 17755825
Variance: 1.4665237 × 10^8
Monotonicity: Not monotonic
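
The charges variance is large enough to be printed in scientific notation: 1.4665237 × 10^8 is roughly 146.7 million (squared currency units). A minimal Python sketch that recovers it from the reported standard deviation, together with the CV, the IQR, and the mean-to-median ratio. The ratio is an illustrative addition rather than a report statistic; a value above 1 is consistent with the positive skewness (1.516) listed above.

```python
# Reported descriptive statistics for `charges`
mean = 13270.422
median = 9382.033
std = 12110.011
q1, q3 = 4740.2872, 16639.913

variance = std ** 2
cv = std / mean
iqr = q3 - q1

# Scientific notation with 8 significant digits, matching "1.4665237 × 10^8"
print(f"Variance: {variance:.7e}")
print(f"CV: {cv:.4f}")
print(f"IQR: {iqr:.1f}")
print(f"Mean / median: {mean / median:.3f}")
```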
[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                Count  Frequency (%)
1639.5631            2      0.1%
16884.924            1      0.1%
29330.98315          1      0.1%
2221.56445           1      0.1%
19798.05455          1      0.1%
13063.883            1      0.1%
13555.0049           1      0.1%
44202.6536           1      0.1%
10422.91665          1      0.1%
7243.8136            1      0.1%
Other values (1327)  1327   99.2%

Minimum 10 values

Value      Count  Frequency (%)
1121.8739  1      0.1%
1131.5066  1      0.1%
1135.9407  1      0.1%
1136.3994  1      0.1%
1137.011   1      0.1%
1137.4697  1      0.1%
1141.4451  1      0.1%
1146.7966  1      0.1%
1149.3959  1      0.1%
1163.4627  1      0.1%

Maximum 10 values

Value        Count  Frequency (%)
63770.42801  1      0.1%
62592.87309  1      0.1%
60021.39897  1      0.1%
58571.07448  1      0.1%
55135.40209  1      0.1%
52590.82939  1      0.1%
51194.55914  1      0.1%
49577.6624   1      0.1%
48970.2476   1      0.1%
48885.13561  1      0.1%

Interactions

[Figures: 16 pairwise interaction plots of the numeric variables]

Correlations

          age    bmi    charges  children  region  sex    smoker
age       1.000  0.108  0.534    0.057     0.000   0.000  0.043
bmi       0.108  1.000  0.119    0.016     0.164   0.000  0.000
charges   0.534  0.119  1.000    0.133     0.065   0.063  0.832
children  0.057  0.016  0.133    1.000     0.000   0.000  0.038
region    0.000  0.164  0.065    0.000     1.000   0.000  0.057
sex       0.000  0.000  0.063    0.000     0.000   1.000  0.069
smoker    0.043  0.000  0.832    0.038     0.057   0.069  1.000
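
The high-correlation alerts in the overview can be read off this matrix by flagging every pair whose coefficient exceeds a cutoff. The Python sketch below does that. The 0.5 cutoff is an assumption chosen because it reproduces the listed alerts, not a value stated in the report. Because the matrix is symmetric, only the upper triangle is scanned.

```python
# Coefficients copied from the correlation table above
columns = ["age", "bmi", "charges", "children", "region", "sex", "smoker"]
matrix = [
    [1.000, 0.108, 0.534, 0.057, 0.000, 0.000, 0.043],
    [0.108, 1.000, 0.119, 0.016, 0.164, 0.000, 0.000],
    [0.534, 0.119, 1.000, 0.133, 0.065, 0.063, 0.832],
    [0.057, 0.016, 0.133, 1.000, 0.000, 0.000, 0.038],
    [0.000, 0.164, 0.065, 0.000, 1.000, 0.000, 0.057],
    [0.000, 0.000, 0.063, 0.000, 0.000, 1.000, 0.069],
    [0.043, 0.000, 0.832, 0.038, 0.057, 0.069, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff for "highly correlated"

# Scan the upper triangle only (j > i), skipping the diagonal of 1.000s
high_pairs = [
    (columns[i], columns[j], matrix[i][j])
    for i in range(len(columns))
    for j in range(i + 1, len(columns))
    if matrix[i][j] > THRESHOLD
]

for a, b, r in high_pairs:
    print(f"{a} ~ {b}: {r:.3f}")
```

This flags age ~ charges (0.534) and charges ~ smoker (0.832), the same three-variable cluster named in the alerts.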

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] A data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

      age  sex     bmi     children  smoker  region     charges
0     19   female  27.900  0         yes     southwest  16884.92400
1     18   male    33.770  1         no      southeast  1725.55230
2     28   male    33.000  3         no      southeast  4449.46200
3     33   male    22.705  0         no      northwest  21984.47061
4     32   male    28.880  0         no      northwest  3866.85520
5     31   female  25.740  0         no      southeast  3756.62160
6     46   female  33.440  1         no      southeast  8240.58960
7     37   female  27.740  3         no      northwest  7281.50560
8     37   male    29.830  2         no      northeast  6406.41070
9     60   female  25.840  0         no      northwest  28923.13692

Last rows

      age  sex     bmi     children  smoker  region     charges
1328  23   female  24.225  2         no      northeast  22395.74424
1329  52   male    38.600  2         no      southwest  10325.20600
1330  57   female  25.740  2         no      southeast  12629.16560
1331  23   female  33.400  0         no      southwest  10795.93733
1332  52   female  44.700  3         no      southwest  11411.68500
1333  50   male    30.970  3         no      northwest  10600.54830
1334  18   female  31.920  0         no      northeast  2205.98080
1335  18   female  36.850  0         no      southeast  1629.83350
1336  21   female  25.800  0         no      southwest  2007.94500
1337  61   female  29.070  0         yes     northwest  29141.36030

Duplicate rows

Most frequently occurring

   age  sex   bmi    children  smoker  region     charges    # duplicates
0  19   male  30.59  0         no      northwest  1639.5631  2
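
A duplicate row matches an earlier row in every column. The overview counts 1 such row, while the # duplicates column (2) gives the total number of times that row occurs. A minimal Python sketch of this counting, assuming rows are compared as tuples of all seven columns. The rows below are an illustrative subset (two rows from the sample plus the reported duplicate, included twice), not the full dataset.

```python
from collections import Counter

# Illustrative subset in (age, sex, bmi, children, smoker, region, charges) order
rows = [
    (19, "female", 27.9, 0, "yes", "southwest", 16884.924),
    (18, "male", 33.77, 1, "no", "southeast", 1725.5523),
    (19, "male", 30.59, 0, "no", "northwest", 1639.5631),
    (19, "male", 30.59, 0, "no", "northwest", 1639.5631),
]

occurrences = Counter(rows)

# Rows that occur more than once, with their total number of occurrences
duplicated = {row: count for row, count in occurrences.items() if count > 1}

# Count every occurrence after the first (assumed to match "Duplicate rows")
n_duplicate_rows = sum(count - 1 for count in duplicated.values())

for row, count in duplicated.items():
    print(row, "# duplicates:", count)
print("Duplicate rows:", n_duplicate_rows)
```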